-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Preview] Add QMC_GPU=no/NVIDIA/AMD/INTEL #3930
Conversation
I would think we would want to skip the *_legacy options, since we eventually want to get rid of those. They would still be available by setting the underlying flags, but they don't need to be advertised in this option. We should clarify the difference between platform hardware and the programming model. CUDA is often used as synonymous with NVIDIA hardware, but the similarity of the HIP programming model complicates that in practice (the programming model for AMD GPU is basically looks like CUDA + translation to HIP + small amount of HIP-specific code) A top-level flag such as this should refer to hardware (NVIDIA, AMD, INTEL), and then set the programming model flags (ENABLE_CUDA, etc). |
Would it be reasonable to say our portability strategy seems to be converging on: OpenMP offload plus small amounts of "native" programming model code for interfacing with libraries (BLAS), higher performance kernels, and additional features.
The reason for specifying that we choose a "native" model for each platform is that there are may projects to provide portability between these models and hardware (e.g., sycl can target NVIDIA and AMD, a project retarget CUDA to Level Zero, etc. These projects have varying levels of maturity and support). We are unlikely to want to use any of these (?) |
I prefer avoid treating legacy special but I see your point of "not advising". So more opinions are well come.
I kind of think most users won't care such details and also it is not our job to educate all users nor they are willing to learn.
That is a very reasonable suggestion.
You can consider OpenMP runtime library itself a program and dynamically loads vendor APIs. It is not trivial and thus we won't do that. |
Remove legacy GPU implementation access from QMC_GPU. |
I think this makes sense. And I think its probably better for development as well. Builds away from the "suggested" require more flags but their intent reads clearly. I assume that we can write the cmake such that QMC_GPU sets the suggested set of internal flags and they get replaced if they are explicitly set.
Do we just leave legacy as QMC_CUDA and whatever collection of flags you use to compile it with HIP? |
I tend to avoid this. If developers need to explicitly control flags for GPU programming models, they know what they are doing. So just don't set If needed in the future, we can add NVIDIA_SYCL and NVIDIA_CUDA as fine-grain variant of NVIDIA. |
I think QMC_CUDA2HIP is just a terrible flag and without reading the cmake again I couldn't tell you what parts of the native CUDA code it effects. I know exactly what I expect if I turn OFFLOAD on and off with a particular GPU setting. I would suggest it not be referenced in the cmake except to set the internal flags and that it set them with the cache keyword so if they are specified on the command line they are not overwritten.
It will do what you want for users and be convenient for developers as well. |
I cannot remove QMC_CUDA2HIP cmake option if we need legacy CUDA converted HIP to keep working with
for the code above,
To make developer friendly, I can clear incompatible values in the cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the code above, -DQMC_GPU=NVIDIA -DENABLE_CUDA=OFF ends up with ENABLE_CUDA being OFF.
In my setting,
And that is exactly the behavior I want. The explicit option overrides the general. The enable flags remain cache variables that I can see in at least the advanced mode of ccmake.
The way it works now you set directly level variables for the ENABLE flags rudely remove cache variable ENABLE if I have them set. If you really can't deal I guess should can do set(... CACHE FORCE). But honestly a user should be doing a clean build so they should be just running cmake on a clean build directory and -DQMC_GPU=NVIDIA the way I suggested will be fine. They won't have ENABLE variables so it will be fine.
I meant |
Someone else should weigh in on this. It's an obscuring flag for development and troubleshooting since now the easy visibility of the ENABLE flags is removed. I think its best thought of as a way to set different defaults for the more specific build flags when you indicate a particular hardware. Leave the freedom to set the more specific flags and not have them overwritten and/or disappeared. Since users should be building once in a clean directory with only -QMC_GPU=<...> I don't see why the behavior I suggest is an issue. |
I will eventually weigh in properly here. Quick take: I think we should disambiguate between a single clean flag that the user specifies to build the code on the cmake line from what we do internally. Currently these seem to be conflated which, based on the discussion, is problematic. Obviously this is not how things are done today. |
Do not merge! Soliciting feedback.
Proposed changes
The current set of GPU flags are not user-friendly. They are still technically good because they are well defined and orthogonal each other.
So last night I was thinking if we can introduce a user friendly user facing CMake option while keep using technical flags internally.
This list can be expanded to include SYCL as well in the future. Users will be restricted to choose one only.
What type(s) of changes does this code introduce?
Does this introduce a breaking change?
What systems has this change been tested on?
epyc-server
Checklist